175 research outputs found
Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm
In this paper, we study a method to sample from a target distribution π over ℝ^d having a positive density with respect to the Lebesgue measure, known up to a normalisation factor. This method is based on the Euler discretization of the overdamped Langevin stochastic differential equation associated with π. For both constant and decreasing step sizes in the Euler discretization, we obtain non-asymptotic bounds for the convergence to the target distribution in total variation distance. Particular attention is paid to the dependency on the dimension d, to demonstrate the applicability of this method in the high-dimensional setting. These bounds improve and extend the results of Dalalyan (2014).
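The Euler discretization the abstract refers to is the iteration X_{k+1} = X_k − γ∇U(X_k) + √(2γ) Z_{k+1}, where π ∝ e^{−U} and the Z_k are standard Gaussians. A minimal sketch, with the target (a standard Gaussian) and the step size chosen purely for illustration, not taken from the paper:

```python
import numpy as np

def ula(grad_u, x0, step, n_iters, rng):
    """Unadjusted Langevin Algorithm with a constant step size.

    Iterates X_{k+1} = X_k - step * grad_u(X_k) + sqrt(2*step) * Z_{k+1},
    the Euler discretization of dX_t = -grad_u(X_t) dt + sqrt(2) dB_t.
    """
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        noise = rng.standard_normal(x.size)
        x = x - step * grad_u(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# Illustrative target: standard Gaussian, U(x) = ||x||^2 / 2, so grad U(x) = x.
rng = np.random.default_rng(0)
samples = ula(lambda x: x, x0=np.zeros(2), step=0.05, n_iters=20000, rng=rng)
burned = samples[5000:]
print(burned.mean(axis=0))  # close to 0
print(burned.var(axis=0))   # roughly 1, up to an O(step) discretization bias
```

With a constant step the chain targets a biased approximation of π, which is exactly why the non-asymptotic bounds above track the step size and the dimension.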
High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm
We consider in this paper the problem of sampling a high-dimensional probability distribution π having a density with respect to the Lebesgue measure on ℝ^d, known up to a normalization constant. Such a problem naturally occurs for example in Bayesian inference and machine learning. Under the assumption that the potential U = −log π is continuously differentiable, its gradient is globally Lipschitz and U is strongly convex, we obtain non-asymptotic bounds for the convergence to stationarity in Wasserstein distance of order 2 and in total variation distance of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence on the dimension of the state space of these bounds is explicit. The convergence of an appropriately weighted empirical measure is also investigated and bounds for the mean square error and exponential deviation inequality are reported for functions which are measurable and bounded. An illustration to Bayesian inference for binary regression is presented to support our claims.
Comment: Supplementary material available at https://hal.inria.fr/hal-01176084/. arXiv admin note: substantial text overlap with arXiv:1507.0502
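The decreasing step-size regime analyzed here can be sketched in the same ULA template, with a schedule γ_k = γ_1 k^{−α}; the particular target (a one-dimensional Gaussian) and the schedule parameters are assumptions for the demo only:

```python
import numpy as np

def ula_decreasing(grad_u, x0, gamma1, alpha, n_iters, rng):
    """ULA with a decreasing step-size schedule gamma_k = gamma1 * k**(-alpha).

    With decreasing steps the discretization bias vanishes along the run,
    so the (unadjusted) iterates can converge to the target itself.
    """
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(1, n_iters + 1):
        gamma = gamma1 * k ** (-alpha)
        noise = rng.standard_normal(x.size)
        x = x - gamma * grad_u(x) + np.sqrt(2.0 * gamma) * noise
        samples[k - 1] = x
    return samples

# Illustrative strongly log-concave target: pi ~ N(1, 1/2),
# i.e. U(x) = (x - 1)^2 with grad U(x) = 2 (x - 1).
rng = np.random.default_rng(1)
s = ula_decreasing(lambda x: 2.0 * (x - 1.0), np.zeros(1),
                   gamma1=0.1, alpha=0.5, n_iters=50000, rng=rng)
print(s[10000:].mean())  # close to the target mean 1
```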
Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains
We consider the minimization of an objective function given access to
unbiased estimates of its gradient through stochastic gradient descent (SGD)
with constant step-size. While the detailed analysis was only performed for
quadratic functions, we provide an explicit asymptotic expansion of the moments
of the averaged SGD iterates that outlines the dependence on initial
conditions, the effect of noise and the step-size, as well as the lack of
convergence in the general (non-quadratic) case. For this analysis, we bring tools from Markov chain theory into the analysis of stochastic gradient descent. We then show that Richardson-Romberg extrapolation may be used to get closer to the global optimum, and we show empirical improvements of the new extrapolation scheme.
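The extrapolation idea can be sketched as follows: run constant-step averaged SGD twice, with steps γ and 2γ, and combine the averages as 2·θ̄_γ − θ̄_{2γ} so that the first-order step-size bias cancels. The objective and noise model below are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

def averaged_sgd(grad, noise_std, x0, step, n_iters, rng):
    """Constant step-size SGD; returns the Polyak-Ruppert average of the iterates."""
    x, total = x0, 0.0
    for _ in range(n_iters):
        g = grad(x) + noise_std * rng.standard_normal()  # unbiased gradient estimate
        x = x - step * g
        total += x
    return total / n_iters

# Illustrative smooth, non-quadratic objective f(x) = e^x + e^{-2x},
# minimized at x* = log(2) / 3 (our choice for the demo).
grad = lambda x: np.exp(x) - 2.0 * np.exp(-2.0 * x)
x_star = np.log(2.0) / 3.0

rng = np.random.default_rng(2)
gamma = 0.05
avg_g = averaged_sgd(grad, 1.0, 0.0, gamma, 200_000, rng)       # bias O(gamma)
avg_2g = averaged_sgd(grad, 1.0, 0.0, 2 * gamma, 200_000, rng)  # bias O(2*gamma)
x_rr = 2.0 * avg_g - avg_2g  # Richardson-Romberg: first-order bias cancels
print(x_rr - x_star)
```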
Analysis of Langevin Monte Carlo via convex optimization
In this paper, we provide new insights on the Unadjusted Langevin Algorithm.
We show that this method can be formulated as a first-order optimization algorithm of an objective functional defined on the Wasserstein space of order 2. Using this interpretation and techniques borrowed from convex optimization, we give a non-asymptotic analysis of this method to sample from log-concave smooth target distributions on ℝ^d. Based on this interpretation, we propose two new methods for sampling from a non-smooth target distribution, which we analyze as well. Moreover, these new algorithms are natural extensions of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm, which is a popular extension of the Unadjusted Langevin Algorithm. Similar to SGLD, they only rely on approximations of the gradient of the target log density and can be used for large-scale Bayesian inference.
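SGLD, mentioned above, replaces the exact gradient of the log posterior in ULA with an unbiased minibatch estimate. A minimal sketch on a conjugate Gaussian-mean model chosen for illustration (the prior, data and step size are our assumptions, not the paper's examples):

```python
import numpy as np

def sgld(grad_log_post_est, theta0, step, n_iters, rng):
    """Stochastic Gradient Langevin Dynamics: ULA driven by a minibatch
    (unbiased) estimate of the gradient of the log posterior."""
    theta = theta0
    out = np.empty(n_iters)
    for k in range(n_iters):
        theta = (theta + 0.5 * step * grad_log_post_est(theta, rng)
                 + np.sqrt(step) * rng.standard_normal())
        out[k] = theta
    return out

# Model (assumed): y_i ~ N(theta, 1), prior theta ~ N(0, 10).
rng = np.random.default_rng(3)
N, m = 1000, 32
data = rng.normal(0.5, 1.0, size=N)  # synthetic data, true mean 0.5

def grad_est(theta, rng):
    # Minibatch estimate of grad log posterior: prior term + rescaled batch term.
    batch = rng.choice(data, size=m, replace=False)
    return -theta / 10.0 + (N / m) * np.sum(batch - theta)

chain = sgld(grad_est, theta0=0.0, step=1e-4, n_iters=20000, rng=rng)
print(chain[5000:].mean())  # close to the posterior mean, sum(data) / (N + 0.1)
```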
Copula-like Variational Inference
This paper considers a new family of variational distributions motivated by
Sklar's theorem. This family is based on new copula-like densities on the
hypercube with non-uniform marginals which can be sampled efficiently, i.e.
with a complexity linear in the dimension of the state space. The proposed variational densities can be seen as arising from these copula-like densities used as base distributions on the hypercube, with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity O(d log d). We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
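The pipeline described above (hypercube base → Gaussian quantile function → sparse rotations) can be sketched as follows. This is a simplification: we use a uniform base on the hypercube, whereas the paper's copula-like base has non-uniform marginals, and the Givens rotation angles are arbitrary illustrative choices:

```python
import numpy as np
from statistics import NormalDist

def givens(d, i, j, theta):
    """Sparse (Givens) rotation acting on coordinates i and j only."""
    r = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    r[i, i], r[j, j], r[i, j], r[j, i] = c, c, -s, s
    return r

def sample_flow(n, d, rotations, rng):
    """Hypercube base -> Gaussian quantile function -> sparse rotations."""
    u = rng.uniform(1e-9, 1 - 1e-9, size=(n, d))  # base samples on the hypercube
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    z = inv_cdf(u)                                # marginal Gaussian quantiles
    for r in rotations:
        z = z @ r.T                               # each rotation touches 2 coordinates
    return z

rng = np.random.default_rng(4)
rots = [givens(4, 0, 1, 0.7), givens(4, 2, 3, -0.3), givens(4, 1, 2, 1.1)]
x = sample_flow(5000, 4, rots, rng)
print(np.cov(x.T).round(2))  # ~ identity: rotations preserve N(0, I)
```

With a uniform base the output is exactly Gaussian, which is why the empirical covariance stays near the identity; the non-uniform marginals of the actual copula-like base are what make the resulting family non-Gaussian.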
Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo
This paper presents a detailed theoretical analysis of the Langevin Monte
Carlo sampling algorithm recently introduced in Durmus et al. (Efficient
Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets
Moreau, 2016) when applied to log-concave probability distributions that are
restricted to a convex body K ⊂ ℝ^d. This method relies on a regularisation procedure involving the Moreau-Yosida envelope of the indicator function associated with K. Explicit convergence bounds in total variation norm and in Wasserstein distance are established. In particular, we show that the complexity of this algorithm given a first-order oracle is polynomial in the dimension of the state space. Finally, some numerical experiments are presented to compare our method with competing MCMC approaches from the literature.
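The regularisation can be sketched as follows: the Moreau-Yosida envelope of the indicator of K has gradient (x − proj_K(x))/λ, so plain ULA can be run on the smoothed potential. The target (a Gaussian restricted to the unit ball) and the parameters below are assumptions for this demo, not the paper's experiments:

```python
import numpy as np

def myula(grad_f, proj_k, x0, step, lam, n_iters, rng):
    """ULA on the Moreau-Yosida smoothing of pi ∝ exp(-f) restricted to K.

    The non-smooth indicator of K is replaced by its Moreau-Yosida envelope,
    whose gradient is (x - proj_K(x)) / lam; ULA is then run on the smooth
    surrogate potential f + envelope.
    """
    x = np.asarray(x0, dtype=float)
    out = np.empty((n_iters, x.size))
    for k in range(n_iters):
        grad = grad_f(x) + (x - proj_k(x)) / lam
        x = x - step * grad + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        out[k] = x
    return out

# Illustrative target: standard Gaussian restricted to the unit ball,
# so f(x) = ||x||^2 / 2 and proj_K is projection onto the ball.
def proj_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

rng = np.random.default_rng(5)
s = myula(lambda x: x, proj_ball, np.zeros(2),
          step=0.01, lam=0.05, n_iters=30000, rng=rng)
inside = np.linalg.norm(s, axis=1) <= 1.0 + 3 * np.sqrt(0.05)
print(inside.mean())  # most samples lie in a small neighbourhood of the ball
```

Smaller λ concentrates the samples closer to K but stiffens the gradient (Lipschitz constant ~ 1/λ), forcing a smaller step, which is the trade-off the convergence bounds quantify.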
- …